21 research outputs found

GPU Accelerated Color Correction and Frame Warping for Real-time Video Stitching

Author: Bai Xinyi
Cheng Hong
Kong Zhenglun
Li Ting
Lin Zhiye
Yang Lu
Publication date: 17/08/2023

Traditional image stitching focuses on a single panorama frame without considering the spatial-temporal consistency in videos. Applying a straightforward image stitching approach to the video stitching task causes temporal flickering and color inconsistency. Besides, inaccurate camera parameters cause artifacts in image warping. In this paper, we propose a real-time system to stitch multiple video sequences into a panoramic video, based on GPU-accelerated color correction and frame warping without accurate camera parameters. We extend the traditional 2D-Matrix (2D-M) color correction approach and present a spatio-temporal 3D-Matrix (3D-M) color correction method for the local overlap regions, with online color balancing using a piecewise function on global frames. Furthermore, we use pairwise homography matrices given by coarse camera calibration for global warping, followed by accurate local warping based on optical flow. Experimental results show that our system can generate high-quality panorama videos in real time.
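
The global warping step mentioned in the abstract relies on pairwise homography matrices. As a rough illustration only, the sketch below applies a 3x3 homography to one pixel coordinate in plain Python. The function name `apply_homography` and the matrix `H_SHIFT` are assumptions made for this example and do not come from the paper; the paper's GPU implementation and its optical-flow local warping are not shown.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (row-major nested lists)."""
    # Multiply H by the homogeneous point (x, y, 1).
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to return to ordinary image coordinates.
    return xh / w, yh / w


# Hypothetical homography for illustration: a translation by (+5, -2) pixels.
H_SHIFT = [
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -2.0],
    [0.0, 0.0, 1.0],
]
```

A homography is defined only up to scale, so multiplying every entry of `H_SHIFT` by the same nonzero constant maps each pixel to the same location.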

arXiv.org e-Print Archive

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

Author: Kong Zhenglun
Li Sheng
Li Yanyu
Ren Jian
Tang Xulong
Tulyakov Sergey
Wang Yanzhi
Yuan Geng
Publication date: 22/09/2022

Recently, sparse training has emerged as a promising paradigm for efficient deep learning on edge devices. Current research mainly devotes its efforts to reducing training costs by further increasing model sparsity. However, increasing sparsity is not always ideal, since it inevitably introduces severe accuracy degradation at extremely high sparsity levels. This paper explores other possible directions to effectively and efficiently reduce sparse training costs while preserving accuracy. To this end, we investigate two techniques, namely layer freezing and data sieving. First, the layer freezing approach has shown its success in dense model training and fine-tuning, yet it has never been adopted in the sparse training domain. Moreover, the unique characteristics of sparse training may hinder the incorporation of layer freezing techniques. Therefore, we analyze the feasibility and potential of using the layer freezing technique in sparse training and find that it can save considerable training costs. Second, we propose a data sieving method for dataset-efficient training, which further reduces training costs by ensuring that only a partial dataset is used throughout the entire training process. We show that both techniques can be well incorporated into the sparse training algorithm to form a generic framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE can significantly reduce training costs while preserving accuracy from three dimensions: weight sparsity, layer freezing, and dataset sieving. Comment: Published in 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
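
To make the idea of layer freezing concrete, here is a minimal sketch of a freezing schedule in plain Python. It is not the SpFDE algorithm: the per-layer freeze epochs in `freeze_at`, and the helper names `trainable_layers` and `fraction_of_updates_saved`, are hypothetical and chosen only to show how freezing layers partway through training removes a share of the per-layer weight updates.

```python
def trainable_layers(epoch, freeze_at):
    """Return the indices of layers still being updated at `epoch`.

    freeze_at[i] is the (hypothetical) epoch from which layer i is frozen.
    """
    return [i for i, freeze_epoch in enumerate(freeze_at) if epoch < freeze_epoch]


def fraction_of_updates_saved(freeze_at, total_epochs):
    """Share of layer-epoch updates skipped compared with training every layer fully."""
    performed = sum(min(freeze_epoch, total_epochs) for freeze_epoch in freeze_at)
    possible = len(freeze_at) * total_epochs
    return 1.0 - performed / possible


# Assumed schedule: earlier layers are frozen sooner than later ones.
FREEZE_AT = [2, 4, 6]
```

With `FREEZE_AT` and six epochs, layer 0 is updated for two epochs, layer 1 for four, and layer 2 for all six, so one third of the layer-epoch updates are skipped. Counting updates this way is only a crude proxy for actual training cost.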

arXiv.org e-Print Archive

You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model

Author: Ding Caiwen
Kong Zhenglun
Li Yao
Liang Yi
Tang Shengkun
Wang Yanzhi
Wang Yaqing
Xu Dongkuan
Zhang Tianchi
Publication date: 20/11/2022

Large-scale Transformer models bring significant improvements to various downstream vision language tasks with a unified architecture. The performance improvements come with increasing model size, resulting in slow inference speed and increased serving cost. While certain predictions benefit from the full complexity of the large-scale model, not all inputs need the same amount of computation, potentially leading to wasted computational resources. To handle this challenge, early exiting has been proposed to adaptively allocate computational power according to input complexity and so improve inference efficiency. Existing early exiting strategies usually adopt output confidence based on intermediate layers as a proxy for input complexity when deciding whether to skip the following layers. However, such strategies cannot be applied to the encoder in the widely used unified architecture with both encoder and decoder, because output confidence is difficult to estimate in the encoder. Ignoring early exiting in the encoder component is suboptimal in terms of saving computation. To address this, we propose a novel early exiting strategy for unified visual language models, which dynamically skips layers in the encoder and decoder simultaneously based on layer-wise input similarities, with multiple early exits, namely MuE. By decomposing the image and text modalities in the encoder, MuE is flexible and can skip different layers for different modalities, improving inference efficiency while minimizing the performance drop. Experiments on the SNLI-VE and MS COCO datasets show that the proposed approach MuE can reduce expected inference time by up to 50% and 40% while maintaining 99% and 96% performance, respectively.
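
The abstract describes exiting based on layer-wise similarity rather than output confidence. The following plain-Python sketch shows one way such a rule could look, under stated assumptions: it uses cosine similarity between a layer's input and output as the saturation signal and a fixed threshold of 0.99. Both choices, and the names `cosine_similarity` and `run_with_early_exit`, are illustrative assumptions and are not taken from the MuE paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def run_with_early_exit(layers, hidden, threshold=0.99):
    """Apply layers in order; stop once a layer barely changes its input.

    Returns the final hidden vector and the number of layers executed.
    """
    layers_used = 0
    for layer in layers:
        new_hidden = layer(hidden)
        layers_used += 1
        # Assumed exit rule: high input/output similarity means the
        # representation has saturated, so the remaining layers are skipped.
        saturated = cosine_similarity(hidden, new_hidden) >= threshold
        hidden = new_hidden
        if saturated:
            break
    return hidden, layers_used
```

Because the rule compares hidden states instead of output confidences, it needs no classifier head at intermediate layers, which is what lets it apply to an encoder as well as a decoder.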

arXiv.org e-Print Archive

The Lottery Ticket Hypothesis for Vision Transformers

Author: Dong Peiyan
Kong Zhenglun
Ma Xiaolong
Meng Xin
Qin Minghai
Shen Xuan
Tang Hao
Wang Yanzhi
Yuan Geng
Publication date: 02/11/2022

The conventional lottery ticket hypothesis (LTH) claims that within a dense neural network, given a proper random initialization method, there exists a sparse subnetwork, called the winning ticket, that can be trained from scratch to almost the same accuracy as its dense counterpart. Meanwhile, the LTH has scarcely been evaluated for vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input images consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch using only this subset of patches and achieve accuracy similar to that of ViTs trained using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input. Furthermore, we present a simple yet effective method to find the winning tickets in input patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. More specifically, we use a ticket selector to generate the winning tickets based on the informativeness of patches. Meanwhile, we build another randomly selected subset of patches for comparison, and the experiments show that there is a clear difference between the performance of models trained with winning tickets and with randomly selected subsets.
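
The comparison described above, informative patches versus a random subset of the same size, can be sketched as follows. This is only an illustration: the per-patch scores are assumed to be given, and the helper names `select_winning_patches` and `select_random_patches`, as well as the top-k rule, are hypothetical stand-ins for the paper's ticket selector, whose details the abstract does not specify.

```python
import random


def select_winning_patches(scores, keep_ratio):
    """Keep the indices of the highest-scoring patches (assumed top-k rule)."""
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in their original spatial order.
    return sorted(ranked[:k])


def select_random_patches(num_patches, keep_ratio, seed=0):
    """Baseline: keep a random subset of the same size for comparison."""
    k = max(1, round(num_patches * keep_ratio))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), k))
```

For example, with scores `[0.1, 0.9, 0.3, 0.7]` and `keep_ratio=0.5`, the informative subset is patches 1 and 3, while the random baseline returns some two of the four patches.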

arXiv.org e-Print Archive

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

Author: Chen Tianlong
Dong Peiyan
Kong Zhenglun
Ma Haoyu
Ma Xiaolong
Meng Xin
Qin Minghai
Shen Xuan
Sun Mengshu
Tang Hao
Wang Yanzhi
Wang Zhangyang
Xie Xiaohui
Xie Yanyue
Yuan Geng
Publication date: 19/11/2022

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so time-consuming training remains unavoidable. In contrast, this paper points out that million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper introduces sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme that explores sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve ViT accuracy rather than compromise it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Comparing the Primary and Recall Immune Response Induced by a New EV71 Vaccine Using Systems Biology Approaches

Author: Fengcai Zhu (354575)
Jie Shao (418495)
Junnan Zhang (813055)
Junzhi Wang (110334)
Miao Xu (130624)
Pan Chen (385477)
Qunying Mao (130602)
Wei Kong (294135)
Xing Wu (328714)
Zhenglun Liang (130633)
Publication date: 14/10/2015

Three inactivated EV71 whole-virus vaccines have completed Phase III clinical trials in mainland China, with high efficacy, satisfactory safety, and sustained immunogenicity. However, the molecular mechanisms by which this new vaccine elicits a potent immune response remain poorly understood. To characterize the primary and recall responses to EV71 vaccines, PBMCs from 19 recipients were collected before and after vaccination with EV71 vaccine, and their gene expression signatures after stimulation with EV71 antigen were compared. The results showed that both the primary and recall responses to EV71 antigen activated an IRF7-regulated type I interferon and antiviral immune response network. However, up-regulated genes involved in T cell activation regulated by IRF1, inflammatory response, B cell activation, and humoral immune response were observed only in the recall response. The specific secretion of IL-10 in the primary response and of IL-2, IP-10, CCL14a, and CCL21 in the recall response was consistent with the activation of immune response processes found at the gene level. Furthermore, the expression of MX1 and the secretion of IP-10 in the recall response were strongly correlated with the NTAb level at 180 days after vaccination (r = 0.81 and 0.99, respectively). In summary, an inflammatory response, an adaptive immune response, and a stronger antiviral response were identified in the recall response.
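
The abstract reports correlation coefficients (r) without stating which coefficient was used. Assuming the common Pearson coefficient, the sketch below shows how such a value is computed in plain Python. The function name `pearson_r` is an assumption for this example, and no data from the study is reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Values of r close to 1, such as those reported above, indicate a nearly linear increasing relationship between the two measurements.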

FigShare

Heat map of DEGs in primary and recall response.

Author: Fengcai Zhu (354575)
Jie Shao (418495)
Junnan Zhang (813055)
Junzhi Wang (110334)
Miao Xu (130624)
Pan Chen (385477)
Qunying Mao (130602)
Wei Kong (294135)
Xing Wu (328714)
Zhenglun Liang (130633)

Colors ranging from blue to red represent the DEGs’ average fold change among the subjects (n = 19). (a) Common genes identified in the primary and recall responses; the fold change of these genes was higher in the recall response than in the primary response. (b) Pathways that were observed only in the recall response, including inflammatory response, antigen processing and presentation, B cell activation, T cell activation, and humoral immune response.

FigShare